Skip to content

geotiff: cap VRT XML read size#1818

Merged
brendancol merged 2 commits into
mainfrom
issue-1815
May 13, 2026
Merged

geotiff: cap VRT XML read size#1818
brendancol merged 2 commits into
mainfrom
issue-1815

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1815.

Adds a 64 MiB default cap on VRT XML reads, configurable via XRSPATIAL_VRT_MAX_XML_BYTES. Streams the file with a running size check and raises ValueError when the cap is exceeded.

Tested: pytest xrspatial/geotiff/tests/test_vrt_xml_size_cap_1815.py

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 16:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a configurable byte cap (default 64 MiB, via XRSPATIAL_VRT_MAX_XML_BYTES) on VRT XML reads in _vrt.read_vrt to prevent memory exhaustion from pathologically large VRT XML files, addressing issue #1815. Reading is now streamed in 64 KiB chunks with a running size check that raises ValueError when exceeded.

Changes:

  • Add _get_vrt_max_xml_bytes (env-driven cap with validation) and _read_vrt_xml (bounded streaming reader) helpers in _vrt.py.
  • Replace unbounded f.read() in read_vrt with the new bounded reader.
  • Add regression tests covering under-cap success, over-cap failure, env-var override, and invalid cap values.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
xrspatial/geotiff/_vrt.py Introduces env-configurable XML size cap and streaming reader used by read_vrt.
xrspatial/geotiff/tests/test_vrt_xml_size_cap_1815.py New tests verifying cap behavior, env override, and invalid-cap handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: VRT XML parser reads file without a size cap

2 participants